MEDB 5501, Module 07

2024-08-27

Topics to be covered

  • What you will learn
    • Multiple linear regression
    • R code for multiple linear regression
    • Categorical independent variables
    • R code for categorical independent variables
    • Diagnostic plots and multicollinearity
    • R code for diagnostic plots and multicollinearity
    • Your homework

Model

  • \(Y_i=\beta_0+\beta_1 X_{1i}+\beta_2 X_{2i}+\epsilon_i\)
  • Least squares estimates: \(b_0,\ b_1,\ b_2\)
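The least squares estimates minimize the sum of squared residuals. In matrix form they solve the normal equations \((X^\top X)\,b = X^\top Y\), where \(X\) has a column of ones (for \(b_0\)) followed by the \(X_1\) and \(X_2\) columns. A minimal base-R sketch, assuming simulated data (the names x1, x2, y and the true coefficients 1, 2, and -0.5 are hypothetical, not from the fev data), computes \(b_0,\ b_1,\ b_2\) this way and checks them against lm().

```r
# Simulated data for illustration only (hypothetical values, not the fev data)
set.seed(5501)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of ones for b0, then X1 and X2
X <- cbind(1, x1, x2)

# Solve the normal equations (X'X) b = X'Y for the least squares estimates
b_normal <- solve(t(X) %*% X, t(X) %*% y)

# lm() fits the same model, so its estimates should agree
fit  <- lm(y ~ x1 + x2)
b_lm <- coef(fit)

print(cbind(normal_equations = as.vector(b_normal), lm = b_lm))
```

In practice lm() is preferred because it uses a numerically stabler QR decomposition; the explicit solve() is shown only to make the formula visible.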

Interpretations

  • \(b_0\) is the estimated average value of Y when \(X_1\) and \(X_2\) both equal zero.
  • \(b_1\) is the estimated average change in Y
    • when \(X_1\) increases by one unit, and
    • \(X_2\) is held constant
  • \(b_2\) is the estimated average change in Y
    • when \(X_2\) increases by one unit, and
    • \(X_1\) is held constant
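The "held constant" wording can be checked directly. This sketch uses simulated data (hypothetical, not the fev data): it predicts Y for two hypothetical subjects who share the same \(X_2\) but differ by one unit in \(X_1\). The difference in predicted averages is exactly \(b_1\), and the prediction at \(X_1 = X_2 = 0\) is exactly \(b_0\).

```r
# Simulated data for illustration only (hypothetical values)
set.seed(5501)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
b   <- coef(fit)

# Two hypothetical subjects: same X2, X1 differs by one unit
new_points <- data.frame(x1 = c(4, 5), x2 = c(3, 3))
pred <- predict(fit, newdata = new_points)

# The difference in predicted averages equals b1
diff_x1 <- unname(pred[2] - pred[1])
print(c(difference = diff_x1, b1 = unname(b["x1"])))

# The predicted average when X1 = X2 = 0 equals b0
pred0 <- unname(predict(fit, newdata = data.frame(x1 = 0, x2 = 0)))
print(c(prediction_at_zero = pred0, b0 = unname(b["(Intercept)"])))
```

Note that \(b_0\) is only meaningful when zero is a plausible value for both predictors; a height of zero inches, for example, is not.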

Unadjusted relationship between height and FEV

Relationship between height and FEV controlling for age (one plot for each age from 3 to 19 years)

Unadjusted relationship between age and FEV

Relationship between age and FEV controlling for height (one plot for each height band, in inches: 46 to 49.5, 50 to 53.5, 54 to 57.5, 58 to 61.5, 62 to 65.5, 66 to 69.5, and 70 to 73.5)
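The plots above contrast an unadjusted relationship with the relationship inside narrow strata of a second variable. The sketch below mimics that idea with simulated data only: the variables age, ht, and out and all of their coefficients are made up (they are NOT the fev data). Because older children are taller and age also has its own effect, the unadjusted height slope absorbs part of the age effect. The adjusted slope from multiple regression and the within-age slopes both stay close to the simulated height effect of 0.1.

```r
# Simulated data with made-up coefficients (hypothetical, NOT the fev data):
# height rises with age, and age also has its own effect on the outcome
set.seed(5501)
n   <- 654
age <- sample(3:19, n, replace = TRUE)
ht  <- 40 + 1.5 * age + rnorm(n, sd = 3)
out <- -4 + 0.1 * ht + 0.2 * age + rnorm(n, sd = 0.4)

# Unadjusted slope: outcome regressed on height alone
slope_unadjusted <- unname(coef(lm(out ~ ht))["ht"])

# Adjusted slope: height coefficient with age held constant
slope_adjusted <- unname(coef(lm(out ~ ht + age))["ht"])

# Stratified view, like one plot per age: a separate slope within each age
slopes_by_age <- sapply(split(data.frame(out, ht), age),
                        function(d) unname(coef(lm(out ~ ht, data = d))["ht"]))

print(round(c(unadjusted = slope_unadjusted,
              adjusted = slope_adjusted,
              median_within_age = median(slopes_by_age)), 3))
```

The multiple regression coefficient can be thought of as a pooled summary of the within-stratum slopes, under the assumption that the height slope is the same at every age.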

Break #1

  • What you have learned
    • Multiple linear regression
  • What’s coming next
    • R code for multiple linear regression

fev data dictionary, 1

data_dictionary: fev (.csv, .sas7bdat, .sav, .txt)
copyright: |
  The author of the JSE article holds the copyright but does not list conditions under which the data can be used. Individual use for educational purposes is probably permitted under the Fair Use provisions of U.S. copyright law.
description: |
  Forced Expiratory Volume (FEV) in children. The data were collected in Boston in the 1970s.
additional_description: https://jse.amstat.org/v13n2/datasets.kahn.html

fev data dictionary, 2

download_url: https://www.amstat.org/publications/jse/datasets/fev.dat.txt
format:
  csv: comma delimited
  sas7bdat: proprietary (SAS)
  sav: proprietary (SPSS)
  txt: fixed width
varnames: not included
missing_value_code: not needed
size:
  rows: 654
  columns: 5

fev data dictionary, 3

age:
  scale: ratio
  range: positive integer
  unit: years
fev:
  label: Forced Expiratory Volume
  scale: ratio
  range: positive real
  unit: liters
ht:
  label: Height
  scale: ratio
  range: positive real
  unit: inches

fev data dictionary, 4

sex:
  value:
    F: Female
    M: Male
smoke:
  value:
    'FALSE': Nonsmoker
    'TRUE': Smoker

simon-5501-07-template.qmd, 1

---
title: "Template for 5501-07 programming assignment"
author: "Steve Simon"
format: 
  html:
    embed-resources: true
date: 2024-09-25
---

There is a [data dictionary][dd] that provides more details about the data. The program was written by Steve Simon on 2024-09-02 and is placed in the public domain.

[dd]: https://github.com/pmean/datasets/blob/master/fev.yaml

simon-5501-07-template.qmd, 2

## Libraries

The tidyverse library is the only one you need for this program.

```{r setup}
#| message: false
#| warning: false
library(tidyverse)
```

Break #2

  • What you have learned
    • R code for multiple linear regression
  • What’s coming next
    • Categorical independent variables

Categorical independent variables

Break #3

  • What you have learned
    • Categorical independent variables
  • What’s coming next
    • R code for categorical independent variables

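A short base-R sketch of how R handles a categorical independent variable. The six observations below are made-up toy values (not from the fev data); the F/M coding only mirrors the sex variable in the data dictionary. With a single two-level factor, lm() creates one indicator (dummy) column, so the intercept is the mean of the reference level and the slope is the difference between the two group means.

```r
# Hypothetical toy data (not the fev data), coded F/M like the sex variable
sex <- factor(c("F", "F", "F", "M", "M", "M"))
y   <- c(2.0, 2.4, 2.2, 2.9, 3.1, 3.3)

# R builds the indicator column automatically; F is the reference level
# because factor levels are sorted alphabetically by default
print(model.matrix(~ sex))

fit <- lm(y ~ sex)
b   <- coef(fit)

# Intercept = mean of the reference level (F);
# sexM coefficient = mean for M minus mean for F
group_means <- tapply(y, sex, mean)
print(b)
print(group_means)
```

To use a different reference level, relevel the factor first, for example relevel(sex, ref = "M"); the fitted values do not change, only the interpretation of the coefficients.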

Break #4

  • What you have learned
    • R code for categorical independent variables
  • What’s coming next
    • Diagnostic plots and multicollinearity

Diagnostic plots and multicollinearity

Break #5

  • What you have learned
    • Diagnostic plots and multicollinearity
  • What’s coming next
    • R code for diagnostic plots and multicollinearity

simon-5501-07-template.qmd, 4

## Reading the data

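Here is the code to read the data and show a glimpse.

```{r read}
# The data dictionary says variable names are not included in the file,
# so supply them in the order the dictionary lists them
fev_names <- c("age", "fev", "ht", "sex", "smoke")
fev <- read_csv(
  file="../data/fev.csv",
  col_names=fev_names,
  col_types="nnncc")
glimpse(fev)
```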
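The template code above only reads the data. As a hedged sketch of the diagnostic and multicollinearity checks, the block below uses simulated predictors (hypothetical, not the fev data) in which x2 is nearly a copy of x1. It draws the four standard diagnostic plots that plot() produces for an lm fit, then computes variance inflation factors by hand as \(1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The vif() function in the car package is a common packaged alternative; the hand calculation avoids an extra dependency.

```r
# Simulated predictors (hypothetical): x2 is built to be strongly
# correlated with x1, which inflates the variance of their estimates
set.seed(5501)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Variance inflation factor by hand: VIF_j = 1 / (1 - R_j^2)
predictors <- c("x1", "x2", "x3")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = dat))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```

Values well above 5 or 10 are often taken as a warning sign; here x1 and x2 should show large VIFs while x3 stays near 1.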

Break #6

  • What you have learned
    • R code for diagnostic plots and multicollinearity
  • What’s coming next
    • Your homework

simon-5501-07-directions.qmd, 1

---
title: "Directions for 5501-07 programming assignment"
author: "Steve Simon"
format: 
  html:
    embed-resources: true
date: 2024-08-18
---

This code is placed in the public domain.

simon-5501-07-directions.qmd, 2

## Setup

-   Download the [template][tem]
    -   Store it in your src folder
-   Modify the file name
    -   Use your last name instead of "simon"
-   Modify the documentation header
    -   Add your name to the author field
    -   Optional: change the copyright statement
-   Download the [data file][dat]
    -   Store it in your data folder

[tem]: https://github.com/pmean/classes/blob/master/biostats-1/01/src/simon-5501-01-template.qmd
[dat]: https://github.com/pmean/datasets/blob/master/albuquerque-housing.csv
    

simon-5501-07-directions.qmd, 3

## Interpret the output

I provided an interpretation for mean
price. Provide a similar interpretation 
for the other three means. Be sure to

-   use a descriptive name
-   include the units of measurement, if appropriate
-   round to two or three significant digits
-   use comma separators for numbers 1,000 or larger

simon-5501-07-directions.qmd, 4

## Your submission

-   Save the output in html format
-   Convert it to pdf format
-   Make sure that the pdf file includes
    -   Your last name
    -   The number of this course
    -   The number of this module
-   Upload the file

Summary

  • What you have learned
    • Multiple linear regression
    • R code for multiple linear regression
    • Categorical independent variables
    • R code for categorical independent variables
    • Diagnostic plots and multicollinearity
    • R code for diagnostic plots and multicollinearity
    • Your homework